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We propose two approaches for determining the native basins in off-lattice models of proteins. The 
first of them is based on exploring the saddle points on selected trajectories emerging from the 
native state. In the second approach, the basin size can be determined by monitoring random 
distortions in the shape of the protein around the native state. Both techniques yield the similar 
results. As a byproduct, a simple method to determine the folding temperature is obtained. 
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Chains of beads on the cubic or square lattices, 
with some effective interactions between the beads, 
often serve as simple models of proteins (see for in- 
stance ref. [Q). A more realistic modelling, however, 
requires considering off-lattice systems. Simple off- 
lattice heteroplymers have been discussed recently 
by fori, Marinari, and Parisi Q|, Irback et al. 
Klimov and Thirumalai 0|, and by the present au- 
thors @. The purpose to use such models is to 
understand the basic mechanism of folding to the 
native state. In lattice models, the native state 
is usually non-degenerate and it coincides with the 
ground state of the system. Delineating boundaries 
of the native basin in off-lattice systems, however, 
is difficult, especially when the number of degrees 
of freedom is large , yet it is essential for studies of 
almost all equilibrium and dynamical properties of 
proteins. For instance, stability of a protein is deter- 
mined by estimating the equilibrium probability to 
stay in the native basin: the temperature at which 
this probability is ^ defines the folding temperature, 
Tf. The native basin consists of the native state and 
its immediate neighborhood, as shown schematically 
in Figure 1, and it should not be confused with the 
whole folding funnel. The latter involves a much 
larger set of conformations which are linked kineti- 
cally to the native state. 

In most studies, such as in Ref. ||,^, the size of 
a basin is declared by adopting a reasonable but ad 
hoc cutoff bound. Systematic approaches, however, 
are needed and will be presented here. The task 
of delineating of the native basin is facilitated by 
introducing the concept of a distance between two 
conformations a and 6, 5ab- The distance should be 
defined in a way that excludes effects of an overall 
translation or rotation. There are two definitions of 
5ab, for a sequence of N monomers, that we shall 
use. The first one is 



Sib = miniEtikT-^tP , (1) 
where r* denotes the position of monomer i in con- 
formation a and the minimization is performed over 
translations, rotations and reflections. In practice, 
we put chain a over chain h by overlapping the two 
centers of mass, and then we find the optimal rota- 
tion of h which minimizes Sab- 
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FIG. 1. A schematic plot of the energy in the vicinity 
of the native state. 5^ denotes a characteristic size of the 
basin. 

The second is 

(2) 



The first definition is closer to an intuitive under- 
standing of the distance between shapes whereas 
the second is easier to compute, especially in three- 
dimensional situations. Nevertheless both are ex- 
pected to be physically equivalent. 
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FIG. 2. (a) The target native conformation for se- 
quence G. Its energy is equal to -8.716 in units of C. 
The figure shows also the assignment of the couplings 
Aij used to construct sequence R' . The numbers in- 
dicate the relative strengths of the contacts, (b) The 
native conformation of sequence R' . (c) The distances 5 
from local minima to the native state, for sequence G, 
shown as a function of energy of the minima. The dotted 
line corresponds to the minimal distance Smin ■ The hor- 
izontal arrow indicates the size of the basin boundary, as 
obtained form studies of the shape distortions, (d) The 
same as in c) but for sequence R' . 

In this paper, we develop two techniques for de- 
termining the basin of the native state. In the first 
approach, we generate an image of the phase space 
by sampling it by low energy trajectories that start 
from the native state and continue by displacing the 
monomers in a way that increases the distance away 
from the native state while preserving the connect- 
edness of the chain. For each trajectory, a depen- 
dence of the potential energy on the distance away 
from the native state is obtained and locations of the 
saddle points are determined. The average distance 
to a first encountered saddle point, < 5s >, may be 



considered as characterizing the size of the basin. 

In the second technique, which we find to be more 
statistically reliable and more automatic in its im- 
plementation, we adopt a variant of de Gennes's idea 
of the " ant in a labirynth" . Specifically, we char- 
acterize the geometry of the native basin by moni- 
toring random shape distortions of the heteropoly- 
mer in the basin. The distortions are induced by 
diffusive-like displacements of individual beads. The 
method of the shape distortion is implemented by 
first placing the polymer in an initial conformation, 
a(0), which usually coincides with the native state. 
Subsequently, one performs random displacements 
of individual beads in the chain, through a Monte 
Carlo routine, and the conformation at time t ac- 
quires a shape a{t). The process is characterized 
by determining the evolution of a mean square dis- 
tance, < (5^ >t = < '5^(o)a(t) between a(t) and 
a(0). The focus is on short time behavior and the 
average is over many trajectories that start from the 
same a(0). The characteristic size of the basin, 5c 
is obtained by studying features in < (5^ >t as de- 
scribed later. In order to scale the walls of the basin 
one needs to make the polymer 'crawl' up these walls 
without involving any kinetic energy. Thus the pro- 
cess does not correspond to any 'real' evolution in 
time, as defined e.g. through the molecular dynam- 
ics. As opposed to the first technique, the fiuctua- 
tions in the shape of the polymer are understood to 
be due to coupling to a heat bath. 
Models 

In order to illustrate the two techniques, we con- 
sider a two-dimensional version of the model intro- 
duced by lori, Marinari and Parisi ||^. The Hamil- 
tonian is given by 

H = - do)H,,,+, + 4[-^ - ^]} , 

(3) 

where i and j range from 1 to the number of beads, 
N, which in our model is equal to 16. The distance 
between the beads, dij is defined as \ri — rj\, where 
ri denotes the position of bead i, and is measured 
in units of the standard Lennard- Jones length pa- 
rameter a. The harmonic term in the Hamiltonian 
couples the adjacent beads along the chain. The re- 
maining terms represent the Lennard- Jones poten- 
tial. In j2| Aij is chosen as Aij ^ Aq + ^/erjij , where 
^0 is constant, ?7y 's are Gaussian variables with zero 
mean and unit variance; e controls the strength of 
the quenched disorder. The case of rjij — and 
Aq ^ C would correspond to a homopolymer. Our 
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choice for the values of Aij is that all of the Aij's 
are positive, which corresponds to attraction. We 
measure the energy in units of C and consider k to 
be equal to 25 in units oi C /a^. Smaller values of k 
may violate the self-avoidence of the chain 
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FIG. 3. Typical trajectories from the native state gen- 
erated by the first method. The solid and dotted arrows 
denote the basin sizes < 5a > and 5c obtained by the 
first and second approaches respectively. 

We focus on two 16-monomer sequences in two 
dimensions which were characterized in detail pre- 
viously One of them, G, is a good folder and 
the other, i?', is a bad folder. We used two criteria 
for the quality of folding: 1) based on the evaluation 
of the specific heat and structural susceptibitity j|] 
and 2) based on the location of the folding temper- 
ature, Ty, relative to the temperature at which the 
onset of the glassy effects takes place [|j . G has been 
designed as an off-lattice Go-like sequence |10 and 



the Aij 's are taken to be 1 for native contacts and 
for non-native ones. The target conformation is on 
lattice and is shown in Figure la. This target is the 
same as for the lattice sequence R of Ref. 
The ground state of G has approximately the shape 
of the target - note that the interaction between two 
beads forming a contact also slightly affects other 
beads. The bad folder R' is constructed following 



the rank-ordering procedure which is an off-lattice 
analog of what was done in references We 
choose the equlibrium interbead distance, do of 2^/^ 
and 1.16 (the latter value is determined by the av- 
erage of Aij over all couplings, see Ref. |^) for G 
and R' respectively. The target conformation is cho- 
sen initially to be the same as in Figure la. The 
most strongly attractive Lennard- Jones interactions 
are assigned to the nine native contacts. They are 
enhanced by making the corresponding Aij bigger 
than one. The remaining couplings have A^j's which 
are smaller than 1. Rank ordering of the contacts 
generates good folders among lattice models. In the 
off-lattice models, however, the non-native Lennard- 
Jones interactions, due to their long range nature, 
overconstrain the system and frustrate it leading to 
a deterioration of folding properties. The values of 
Aij for R' are listed in Ref. . The resulting native 
state of R' is shown in Figure lb. 

Figures 2c and 2d show the distributions of dis- 
tances 5 for local energy minima of sequences G 
and R' respectively. The native states and the lo- 
cal energy minima have been obtained by multiple 
quenches from random conformations. Note that the 
sequence G has a much fewer number of the local 
energy minima compared to sequence R' and the 
energy gap to the first excited local minimum is sig- 
nificantly larger. In each case, there is one minimum 
which has the closest geometrical distance, (5mm to 
the native state. 5min is thus an upper bound for 
the size of the native basin (0.5 for G and 0.3 for 
R'). 

Trajectories in the energy landscape 

The first technique is implemented by creating the 
trajectories from the native state at T = in a 
stochastic way. The bead positions are displaced 
randomly and the moves are accepted if they in- 
crease the geometrical distance to the native state 
and keep the distance between nearest beads to be 
in the interval 1 < < l.ldo- In addition one 

has to keep the interaction energy for any pair of 
monomers in sequence R' to be negative. For G, 
however, the non-native pairs interact only repul- 
sively. In order to minimize the repulsion as much 
as possible we explore only those trajectories which 
keep the distances between the monomers of non- 
native contacts sufficiently large. We choose the 
minimal distance between the beads of non-native 
contacts, dm, to be dm = 1-5 (the choice of 1.6 for dm 
yields similar results) and reject trajectories which 
lead to a monotonic increase in energy even after 
entering into the region of overall positive energies. 
Smaller values of dm usually lead to trajectories with 
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unreasonably large positive energies. Substantially 
larger values of dm generate short trajectories which 
terminate on a conformation which does not allow 
for a further increase in the distance. 

Figure 3 shows typical trajectories for G and R' . 
For each trajectory, we define the postion of the 
saddle point 6s ■ This point shows as a local max- 
imum on the energy vs. 6 curve. The average value, 

< 6s >, defines the basin size. Averaging over 50 
trajectories we obtain < 6s >— 0.17 ± 0.02 and 
0.12 ± 0.01 for sequences G and R' respectively. 

It should be noted that the technique described 
here is simple conceptually and easy to implement 
for a small number of trajectories but its resulting 
statistics on the basin size are inherently poor since 
there is no convenient control over the choice of im- 
portant trajectories. Following patterns in the force 
field deterministically is a possible improvement but 
another stochastic approach, described below, com- 
bines simplicity with reliability of the results. 
Fluctuations in the shape of the polymer 

In order to compute < 6^ >t we update the 
monomer positions randomly within circles of radius 
of 0.01 (the choice of 0.02 yield similar results). We 
assume that the system is in contact with a heat 
bath corresponding to temperature T which provides 
a controlling device. Figure 4 shows the dependence 
of < 6^ >t on t for sequence G. We observe that 
there are, in general, three regimes of behavior of 

< >t. For sequence G, all of the three regimes 
appear below Tc— 0.19 and are shown, in Figure 4, 
for T=0.05 and 0.15. 

In the first regime, I, corresponding to very short 
time scales during which merely several percents of a 
linear size of the basin are covered, one has a power 
law 

<6^>t - t"" . (4) 

Pliszka and Marinari jl^ have demonstrated that 
Vf) is sensitive to the details of the Hamiltonian. We 
find that vq is equal to 0.96 ± 0.02 and 0.94 ± 0.02 
for sequences G and R' respectively. Both values are 
close to 1 and stay the same for all T's (even for T 
up to 50) which suggests a simple diffusive behavior. 

In the second regime, II, the plot of < 6^ >t vs. t 
acquires a T-dependence. Sommelius Q has focused 
on this regime and has postulated that the behavior 
here appears to follow a power law, at least approxi- 
mately. The corresponding exponent v is then found 
to depend on T in a way that relates to characteristic 
temperatures of the system. We have found, how- 
ever, that this power law is not robust - the effective 
exponent depends on the size of the steps in which 



one implements distortions. More importantly, the 
spatial extent of this regime is necessarily limited, 
and it would probably remain so even for very long 
heteropolymers . 
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FIG. 4. The dependence of the square of the distance 
to the native state on the Monte Carlo time for G for sev- 
eral values of the temperature which are shown next to 
the curves. The dashed, solid and dotted correspond to 
regimes I, II, and III respectively. The results are av- 
eraged over 400 trajectories. The error bars are smaller 
than the symbol sizes. The optimal rotation, required in 
Eq. (1), is picked from 1000 discrete values into which 
the full 360° angle is divided. 

In the third regime. III, observed only below a 
critical temperature Tc, < (5^ >t saturates at a con- 
stant value as explained in detail in Ref. Above 
Tc, the shape distortion ceases to be confined to the 
native basin and the type II behavior continues to 
take place, as illustrated by the data points corre- 
sponding to T = 0.45 in Figure 4. The limiting 
saturation value of v<6^> at Tc defines a charac- 
teristic basin size, 6c, that can be used, e.g., when 
deciding if a folding took place if the system started 
to evolve from and unfolded state. Naturally, Tc is a 
measure of the folding temperature Tf which char- 
acterizes thermodynamic stability of the system. 

Figure 5 switches from the logarithmic scale of 
Figure 4 to the linear scale in wich the transition 
between the staturation and lack of saturation in 
< >t shows in a more convincing way. Figure 5 
compares the time dependence oi < 6^ >t to that 
of < 6''^ >t for G at Tc and demonstrates that the 
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actual choice of the definition of the distance has 
smah effect on the results. For sequence G we have 
5c = 0.2 ±0.02 (the second definition of the distance 
yields 5'^ = 0.17 ± 0.02) For sequence R' we find 
that Tc « 0.09 and 5c = 0.09 ± 0.02. Within the 
error bars, 5c is close to < 5s > defined by the first 
approach which is also indicated in Figure 3. For 
both sequences, the values of 5c and < i5s > are 
smaller than 5min ■ Thus the saturation value of the 
distance to the native state at T ~ Tc may indeed 
serve as a measure of size of the native basin 5c, i.e 
5c = [< 5^ >sat {T = Tc)]i/2 determines the true 
boundary of the native basin. 
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FIG. 5. The time dependence of 5 5^ >\^'^ 

(black hexagons) and 5' =< J'^ >t''^ (open hexagons) 
for sequence G at the critical temperature Tc=0.19. For 
T = 0.25, 5 (black squares) does not saturate below the 
the distance equal to the minimal distance, 5mm = 0.5, 
between the native state and the local minimum which 
is the nearest geometrically (see Figure 2). 



tortion. It should be noted that the calculation of Tc 
by monitoring stochastic shape distortions is signifi- 
cantly less involving computationally. We have also 
used this technique to determine the sizes of basins 
of low lying local energy minima. The correspond- 
ing values of 5c are found to be smaller than for the 
native state. 

We now consider, following Struglia |Q, the basin 
of attraction for random initial conformations which 
are subsequently quenched in the steepest descent 
fashion. The basin of attraction is defined in terms 
of a distance at which the probability to fall onto the 
native state is bigger than or equal to Pc- The cor- 
responding basin size will be denoted by i5/. In Ref. 

, -pc was taken to be equal to ^ . In our studies, we 
took = 1 and determined the basin of attraction 
for sequences G and B! through a standard quench- 
ing procedure. We considered 200 trajectories and 
obtained 5f k, 0.55 and « 0.35 for sequences G and 
B! respectively. These values exceed not only our 
estimate of 5c but also that of the minimal distance 
between the native state and the nearest minimum 
5ram- The values of 5/ would become even larger 
for a "Pc that was less than 1. This emphasizes the 
point that the procedure used by Struglia probably 
delineates the folding funnel and not the native basin 
itself. 

In summary, we have explored the native basins 
of two off-lattice sequences by monitoring the shape 
distortion and by exploring the saddle points of the 
trajectories from the native state. We have come 
out with computationally simple methods to delin- 
eate the boundaries of the basins and to estimate 
the folding temperature. The bad and good folders 
are found to have native basins which are compara- 
ble in size even though the structure of their folding 
funnels must be very different. 

We thank T. X. Hoang for discussions. This 
work was supported by Komitet Badan Naukowych 
(Poland; Grant number 2P03B-025-13). 



The fact that 5c for G is found to be bigger than 
for B! indicates bigger stability of G relative to B! . 
This also correlates well with the higher value of Tc 
which in turn suggests that Tc is a measure of Tf - 
the folding temperature. This interpretation is con- 
firmed by comparing the values of Tc to Tf obtained 
by calculating the probability to be within the cutoff 
distance, 5ci away from the native state Q and by 
studying positions of the peaks in the structural sus- 
ceptibility [§. These studies yield T/ of 0.24 ± 0.03 
and «0.12 for G and B! respectively. Both of these 
values are close to Tc obtained from the shape dis- 
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